@popcornmix
Collaborator

Some apps like linpack call numa_setpolicy to disable numa, but that tends to have a significant performance hit for us.

Add an option to force the default numa mode (and so ignore policy changes) for now.

Consider making this the default behaviour in the future.

Use mempolicy.force_numa=1 in cmdline.txt or
echo 1 | sudo tee /sys/module/mempolicy/parameters/force_numa

to force the default numa policy.

Contributor

@pelwell pelwell left a comment

Either I've misunderstood or the commit message seems backwards. Where it says "force the default numa mode (and so ignore policy changes)", it looks more like "ignore policy changes (and so force the default numa mode)", although the "default" here is MPOL_INTERLEAVE, not MPOL_DEFAULT.

mm/mempolicy.c Outdated

#include "internal.h"

/* bit field to force hotplug detection. bit0 = HDMI0 */
Contributor

Really?

Collaborator Author

Oops - updated comment.

@popcornmix
Collaborator Author

It prevents run-time switching of the mempolicy, so the default policy chosen through cmdline.txt (which we set to numa_policy=interleave) persists forever.

I want to prevent changes made through numactl (and similar) userspace APIs.

I'm happy to switch to a preferred wording.
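
For context, the kind of runtime request being blocked looks roughly like the sketch below. It uses the raw set_mempolicy syscall rather than libnuma; whether linpack or numactl issue exactly this call is an assumption, and reset_policy_to_default is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MPOL_DEFAULT 0 /* value from include/uapi/linux/mempolicy.h */

/*
 * Ask the kernel to reset the calling thread's memory policy to
 * MPOL_DEFAULT, which drops any interleave policy inherited from boot.
 * Returns 0 on success, or the errno value if the syscall fails
 * (for example ENOSYS on a kernel built without NUMA support).
 */
int reset_policy_to_default(void)
{
    long rc = syscall(SYS_set_mempolicy, MPOL_DEFAULT, NULL, 0UL);

    return rc == 0 ? 0 : errno;
}
```

With the change discussed here, a request like this would be ignored whenever a non-default policy was chosen at boot.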

@pelwell
Contributor

pelwell commented Mar 25, 2025

How about:

Add an option to preserve the default numa mode by ignoring
policy changes for now.

@popcornmix
Collaborator Author

An alternative would be to treat any numa_policy= in cmdline.txt as a reason to ignore calls to kernel_set_mempolicy, so that mempolicy.force_numa= is effectively not required (it is assumed true when numa_policy= exists).

I believe that is the behaviour that makes sense for most users (if the bootloader expects numa to be enabled and so sets BANKLOW=1, and we then disable numa, performance is bad).

@pelwell
Contributor

pelwell commented Mar 25, 2025

I think I prefer that logic.

@popcornmix
Collaborator Author

Updated. Slightly annoyingly, we do get some messages during a normal boot:

[    5.475404] mempolicy: Request to set policy ignored (mode:5 lmode:5 flags:0 err:0)
[    5.475603] mempolicy: Request to set policy ignored (mode:3 lmode:3 flags:0 err:0)
[    5.476811] mempolicy: Request to set policy ignored (mode:5 lmode:5 flags:0 err:0)
[    5.476981] mempolicy: Request to set policy ignored (mode:3 lmode:3 flags:0 err:0)
[   18.969939] mempolicy: Request to set policy ignored (mode:5 lmode:5 flags:0 err:0)
[   18.970114] mempolicy: Request to set policy ignored (mode:3 lmode:3 flags:0 err:0)

5=MPOL_PREFERRED_MANY 3=MPOL_INTERLEAVE.
I'm not sure who is requesting MPOL_PREFERRED_MANY (a backtrace just shows it comes from invoke_syscall, so userspace). I'm guessing systemd.

Ignoring these is reasonable, but the spam is unfortunate.
I'm happy to remove the debug, but there is some value in the messages if it occurs later.
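
For anyone decoding the mode numbers in the log above, here is a small lookup sketch. The values are taken from the MPOL_* enum in include/uapi/linux/mempolicy.h as I understand it for recent kernels; mpol_mode_name and mpol_check_log_modes are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mode names indexed by their numeric value in the uapi enum. */
static const char *const mpol_names[] = {
    "MPOL_DEFAULT",             /* 0 */
    "MPOL_PREFERRED",           /* 1 */
    "MPOL_BIND",                /* 2 */
    "MPOL_INTERLEAVE",          /* 3 */
    "MPOL_LOCAL",               /* 4 */
    "MPOL_PREFERRED_MANY",      /* 5 */
    "MPOL_WEIGHTED_INTERLEAVE", /* 6 */
};

/* Map a numeric mode, as printed in the log, to its name. */
const char *mpol_mode_name(int mode)
{
    size_t count = sizeof(mpol_names) / sizeof(mpol_names[0]);

    if (mode < 0 || (size_t)mode >= count)
        return "unknown";
    return mpol_names[mode];
}

/* The two modes that appear in the boot log. */
void mpol_check_log_modes(void)
{
    assert(strcmp(mpol_mode_name(5), "MPOL_PREFERRED_MANY") == 0);
    assert(strcmp(mpol_mode_name(3), "MPOL_INTERLEAVE") == 0);
}
```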

@pelwell
Contributor

pelwell commented Mar 25, 2025

Short of adding a time check, or perhaps some check on the calling process, I don't think there's much we can do about that.

@popcornmix
Collaborator Author

Options are:
- disable the messages
- gate the messages on a cmdline parameter, defaulted to off
- live with the spam

Any preference?

@pelwell
Contributor

pelwell commented Mar 25, 2025

How about muting them for the first 30-40 seconds? That would get us past the boot.

@popcornmix popcornmix force-pushed the force_numa branch 2 times, most recently from 83fdcdd to 5dcc6a4 on March 25, 2025 20:40
@popcornmix
Collaborator Author

Okay, I've gone for the "ignore the first 40 seconds of boot" option.

Some apps like linpack use numa_setpolicy to disable numa,
but that tends to have a significant performance hit for us.

If you have a cmdline.txt setting of numa_policy (to something other
than default), then let's ignore runtime changes and stick with
the cmdline.txt setting.

Not specifying numa_policy in cmdline, or setting
numa_policy=default(*) will allow runtime settings to work.

(*) easier to do when numa_policy=interleave is set in DT.

Ignore logging for the first 40 seconds as there are some
expected switches during boot.

Signed-off-by: Dom Cobley <[email protected]>
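
The behaviour described in the commit message can be modelled in userspace roughly as below. This is only a sketch of the decision logic, not the merged kernel code: struct policy_state, request_policy and LOG_MUTE_SECONDS are hypothetical names, and the message text is illustrative, modelled on the boot log earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MPOL_DEFAULT     0  /* value from the kernel uapi header */
#define LOG_MUTE_SECONDS 40 /* boot window in which messages are dropped */

struct policy_state {
    int boot_mode;    /* mode from numa_policy=, MPOL_DEFAULT if unset */
    int current_mode; /* mode currently in effect */
    int logged;       /* count of "ignored" messages emitted */
};

/*
 * Handle a runtime request to change the policy.
 * Returns true if the request was applied, false if it was ignored.
 */
bool request_policy(struct policy_state *st, int mode, double uptime_s)
{
    assert(st != NULL);

    /* No non-default policy chosen at boot: runtime changes work. */
    if (st->boot_mode == MPOL_DEFAULT) {
        st->current_mode = mode;
        return true;
    }

    /* Policy was set at boot: ignore, and only log after the window. */
    if (uptime_s >= LOG_MUTE_SECONDS) {
        fprintf(stderr,
                "mempolicy: Request to set policy ignored (mode:%d)\n",
                mode);
        st->logged++;
    }
    return false;
}
```

For example, with boot_mode set to 3 (MPOL_INTERLEAVE), a request at 5 seconds of uptime is dropped silently, the same request at 60 seconds is dropped with a message, and with boot_mode left at MPOL_DEFAULT either request is applied.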
@popcornmix
Collaborator Author

I've reduced the debug message a little (I think only the mode set is actually interesting).

@pelwell pelwell merged commit 550a46e into raspberrypi:rpi-6.12.y Mar 26, 2025
13 checks passed
@popcornmix popcornmix deleted the force_numa branch March 26, 2025 12:47
@popcornmix popcornmix changed the title from "mm/mempolicy: Add override to force numa policy" to "mm/mempolicy: Ignore runtime policy changes when set through cmdline" Mar 26, 2025
popcornmix added a commit to raspberrypi/firmware that referenced this pull request Mar 26, 2025
popcornmix added a commit to raspberrypi/rpi-firmware that referenced this pull request Mar 26, 2025